feat(script): new script to fill positive_expected_result and fixing minibug on similarityID generation in scan commands by cx-ricardo-jesus · Pull Request #8011 · Checkmarx/kics

cx-ricardo-jesus · 2026-03-24T15:57:01Z

Reason for Proposed Changes

We are currently adding more fields to each query's positive_expected_result.json and automating what used to be a manual step: copying the results of a KICS scan into that file. This script now does that automatically. The content it writes should still be reviewed before the changes are committed.

Proposed Changes

Implemented a new script, generate_positive_expected_result.py, that does the work described above.
The first thing the script does is call parse_args(), which sets up a CLI with two mutually exclusive modes:
- --run-all: Scan every query found under assets/queries/.
- --queryID + --queryPath: Scan a single specific query.
If --run-all is passed, iter_queries() is called, which walks the entire assets/queries directory tree. For every subdirectory that contains a metadata.json (that is, a directory that holds a query), it reads the "id" field and yields (query_id, query_path), so that every query present in this repo is covered. For each (query_id, query_path) pair, the function run_query_scans(query_id, query_path) is called.
If the arguments queryID and queryPath are passed in the command line, the function run_query_scans(args.queryID, args.queryPath) is called directly for that one query.
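The two CLI modes and the query walk described above could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names come from the description, but the validation logic, the QUERIES_ROOT constant, and the sorted traversal are assumptions.

```python
import argparse
import json
from pathlib import Path
from typing import Iterator, Optional, Sequence, Tuple

# Assumed location of the queries tree, as named in the PR description.
QUERIES_ROOT = Path("assets/queries")


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Accept either --run-all, or --queryID together with --queryPath."""
    parser = argparse.ArgumentParser(
        description="Fill positive_expected_result.json from KICS scan results"
    )
    parser.add_argument("--run-all", action="store_true", help="scan every query")
    parser.add_argument("--queryID", help="ID of a single query")
    parser.add_argument("--queryPath", help="directory of a single query")
    args = parser.parse_args(argv)

    single = args.queryID is not None or args.queryPath is not None
    if args.run_all and single:
        parser.error("--run-all cannot be combined with --queryID/--queryPath")
    if not args.run_all and not (args.queryID and args.queryPath):
        parser.error("pass --run-all, or both --queryID and --queryPath")
    return args


def iter_queries(root: Path = QUERIES_ROOT) -> Iterator[Tuple[str, Path]]:
    """Yield (query_id, query_path) for every directory holding a metadata.json."""
    # Sorting is an assumption made here so the traversal order is stable.
    for metadata in sorted(root.rglob("metadata.json")):
        with metadata.open(encoding="utf-8") as fh:
            query_id = json.load(fh)["id"]
        yield query_id, metadata.parent
```

In this sketch, a caller would branch on args.run_all and either loop over iter_queries() or handle the single (queryID, queryPath) pair directly.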
The function run_query_scans(query_id, query_path) discovers all positive test files for the given query, runs the appropriate KICS scans with the flags --experimental-queries, --bom, --enable-openapi-refs, and --kics_compute_new_simid, and then writes the positive_expected_result.json output(s).
- Step 1: Discover the test files. The function calls find_positive_tests(query_path), which looks inside <query_path>/test/. For every entry whose name starts with positive, it handles two layouts:
  - File layout (test/positiveX.<ext>): If the entry is a regular file (e.g., positive1.tf, positive2.yaml), it creates a PositiveTest object (defined in models.py) with:
    - label: positive<N>_<ext> (e.g., "positive1_tf")
    - scan_path: the path of the test file.
    - group: test - meaning results go to test/positive_expected_result.json.
  - Directory layout (test/positiveX/): If the entry is a subdirectory (e.g., positive2/), it iterates over the files inside. For each file (e.g., positive2_1.tf):
    - label: same as above (e.g., positive2_1_tf)
    - scan_path: the path to the file inside the test subdirectory.
    - group: test/positive2 - meaning results go to test/positive2/positive_expected_result.json.
  - The extension is always included in the label so that files with the same base name but different extensions produce distinct labels and result files. All discovered tests are returned and sorted by their label using natural sort (so that positive2 comes before positive10).
- Step 2: Set up temporary directories: A single TemporaryDirectory is created for the entire query run, with the payloads/ and results/ subdirectories where KICS writes its payload files and JSON scan result files, respectively.
- Step 3: Choose scan strategy based on test layout:
  - After discovering the tests, run_query_scans checks if the query test directory has any subdirectory-based tests.
  - If no subdirectory tests are found (all test files in test/), the function runs two levels of scans:
    - Directory scan: Calls run_directory_scan(query_id, all_paths, ...) with every positive file at once.
    - Individual file scans (skipped for passwords_and_secrets): the function run_scan is called for each positive file separately.
    - Both approaches are used because, over several iterations of the script, some queries failed their unit tests: scanning the test directory with all the samples at once can produce different results from scanning each test file individually.
  - If the query has subdirectory tests: this handles queries that have both loose files (e.g., positive1.tf) and subdirectory files (e.g., positive2/positive2_1.tf):
    - Directory scan for loose files (if any): the same as mentioned above, but only for loose files (files that are not inside a test subdirectory).
    - For the subdirectory test files, all files inside that subdirectory are scanned together via run_directory_scan.
  - To scan all the files in the test directory at once, or all the files inside a test subdirectory, the run_directory_scan function is used. It runs a single KICS scan command that targets every file in that directory. The scan is done inside a temporary directory that mirrors the path under assets/queries so that the similarityIDs match. The files passed to this function are assumed to share the same parent directory (always test/ or test/<dir>/). The function takes the parent of the first file as src_dir, then computes its path relative to assets/queries/ to know where to mirror it inside the temp directory. It then iterates over the positive files in the list and copies each one into the mirrored temp directory, preserving only the filename (not the full path), so that all the positive files sit together inside target_dir exactly as they do inside assets/queries/.../test. A second loop copies every file whose name does not start with positive or negative; these are auxiliary files, such as certificates, that the tests depend on. The KICS CLI command is then built with the temporary directory tmp_dir as the scan root and printed to stdout for traceability, and the command is run as a subprocess. If the KICS scan exits with a code that is not in KICS_RESULTS_CODES, it prints an error.
  - To scan a single positive test file, the run_scan function is used. It first computes the path of the file relative to assets/queries and stores it in the rel_to_queries variable. For example, if scan_path is .../assets/queries/terraform/aws/s3/test/positive1.tf, then rel_to_queries becomes terraform/aws/s3/test/positive1.tf. As above, this relative path is replicated inside the temp directory so that the KICS engine computes the same similarityID as the unit tests do. The auxiliary files are then copied using the _copy_auxiliary_files helper, and a KICS scan command is run using the helper function _run_kics, as above.
- Step 4: After all scans complete, the function collect_and_write_expected_results(query_path, results_dir, label_to_group) aggregates results and writes the final output files.
  - First, it reads all result JSON files in results_dir. For each file, it looks up the label (filename without extension) in label_to_group to determine which group it belongs to (test or test/<dir>). It reads the data inside queries and bill_of_materials (combined into the all_findings variable) and, for each finding, extracts every file entry, converting it into an ExpectedResultEntry (defined in models.py).
  - In earlier versions of the script, the unit tests failed for the Passwords and Secrets query. To fix this, the fix_secrets_query_names function was created. This function reads regex_rules.json, identifies which rule IDs appear more than once, compiles the regex pattern of each affected rule, and then re-matches each affected finding against those patterns using the actual line content from the source file. Once the correct rule is identified, the entry's queryName is updated accordingly. This correction step ensures that the positive_expected_result.json for passwords_and_secrets reflects the true rule that triggered each finding, which prevents errors in the unit tests.
  - After the passwords_and_secrets query names are fixed, entries within each group are deduplicated on all the fields in the FIELD_ORDER variable, using a set of tuples to remove exact duplicates that can arise when the same finding appears in both the directory scan and the individual scan results.
  - After this deduplication process, if subdirectory results exist but no loose file results were produced (edge case where test/ has only subdirectory positives), an empty test group is added. This ensures the unit tests always find a test/positive_expected_result.json to read.
  - After that, each group's entries are sorted using sort_key() from ExpectedResultEntry, which mirrors the ordering of the vulnerabilityCompare Go function in test/queries_test.go. This ensures that the order of the written file matches exactly what the unit test produces when it sorts its actual findings, so comparisons are deterministic.
  - Finally, each group writes its entries as a JSON array to <query-path>/<group>/positive_expected_result.json.
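The labelling and natural sort from Step 1 could look something like the sketch below. The PositiveTest fields follow the description, but the helper names make_label and natural_key, and the explicit skip of positive_expected_result.json (which also starts with "positive"), are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import List

EXPECTED_FILE = "positive_expected_result.json"


@dataclass(frozen=True)
class PositiveTest:
    label: str       # e.g. "positive1_tf"
    scan_path: Path  # file handed to the scan
    group: str       # "test" or "test/<dir>"


def make_label(path: Path) -> str:
    """Hypothetical helper: keep the extension so positive1.tf != positive1.yaml."""
    ext = path.suffix.lstrip(".")
    return f"{path.stem}_{ext}" if ext else path.stem


def natural_key(label: str) -> list:
    """Split digits from text so that positive2 sorts before positive10."""
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", label)]


def find_positive_tests(query_path: Path) -> List[PositiveTest]:
    found = []
    for entry in (query_path / "test").iterdir():
        # Assumption: the expected-result file itself must not count as a test.
        if not entry.name.startswith("positive") or entry.name == EXPECTED_FILE:
            continue
        if entry.is_file():
            found.append(PositiveTest(make_label(entry), entry, "test"))
        elif entry.is_dir():
            for child in entry.iterdir():
                if child.is_file() and child.name != EXPECTED_FILE:
                    group = f"test/{entry.name}"
                    found.append(PositiveTest(make_label(child), child, group))
    return sorted(found, key=lambda t: natural_key(t.label))
```

Splitting on digit runs gives keys that alternate between strings and integers, so two labels are always compared type by type and "positive2_tf" lands before "positive10_tf".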
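The final aggregation (deduplicate, add the empty test group, sort, write) might be sketched as below. The FIELD_ORDER contents and the sort key here are placeholders: the real script uses its own field list and a sort_key() that mirrors vulnerabilityCompare, neither of which is reproduced in this description. write_groups is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List

# Placeholder field list; the script's real FIELD_ORDER may differ.
FIELD_ORDER = ("queryName", "severity", "line", "fileName")


def dedupe(entries: Iterable[dict]) -> List[dict]:
    """Drop exact duplicates, comparing entries on every field in FIELD_ORDER."""
    seen = set()
    unique = []
    for entry in entries:
        key = tuple(entry.get(field) for field in FIELD_ORDER)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


def write_groups(query_path: Path, groups: Dict[str, List[dict]]) -> None:
    """Write one positive_expected_result.json per group under query_path."""
    # Edge case from the description: only subdirectory results exist,
    # so an empty "test" group is added for the unit tests to read.
    if groups and "test" not in groups:
        groups = {**groups, "test": []}
    for group, entries in groups.items():
        # Stand-in ordering; the real sort_key() follows the Go comparator.
        ordered = sorted(
            dedupe(entries),
            key=lambda e: (e["fileName"], e["line"], e["queryName"]),
        )
        out = query_path / group / "positive_expected_result.json"
        out.parent.mkdir(parents=True, exist_ok=True)
        with out.open("w", encoding="utf-8") as fh:
            json.dump(ordered, fh, indent=2)
```

Using a tuple of field values as the set key means two findings count as duplicates only when every compared field matches, which is the case when a directory scan and an individual scan report the same finding.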

Also changed the function getFilesMetadatasWithContent in test/main_test.go so that it sets the respective SubDocumentIndex value for each file. This value is used for multi-document .yaml samples in some queries, and the change mirrors the logic in the (*Service).sink() function in pkg/kics/sink.go. This fixes cases where samples, typically in .yaml format, contain multiple documents and therefore produced different similarityIDs between CLI KICS scan commands and the unit tests.

Added coverage for CNI files. This had already been implemented (in this PR) but was possibly removed by mistake in this merge commit with the master branch. The change enables KICS to detect CNI files.

Also updated the documentation with information about the script and how to run it.

I submit this contribution under the Apache-2.0 license.

github-actions · 2026-03-24T15:57:56Z

KICS version: v2.1.20

	Category	Results
	CRITICAL	0
	HIGH	0
	MEDIUM	0
	LOW	0
	INFO	0
	TRACE	0
	TOTAL	0

Metric	Values
Files scanned	1
Files parsed	1
Files failed to scan	0
Total executed queries	47
Queries failed to execute	0
Execution time	0

cx-ricardo-jesus requested a review from a team as a code owner March 24, 2026 15:57

cx-ricardo-jesus marked this pull request as draft March 30, 2026 12:15

cx-ricardo-jesus marked this pull request as ready for review March 30, 2026 14:05

cx-ricardo-jesus marked this pull request as draft March 30, 2026 14:06

cx-ricardo-jesus added 14 commits April 21, 2026 10:24

the script now runs a KICS scan command

b57b76f

updated script to fill positive_expected_result file

41817e3

reverted some changes

8d85234

added script

af381b6

changed KICS documentation

8348bc5

added --kics_compute_new_simid flag

0eecb5a

added --kics_compute_new_simid

a917f73

added 1 to result codes

80ff503

fixing script to take into account scenarios where there is multiple tests with the same name but different extension

05d06a7

redefining all_findings variable

f06f5b3

changed queries-test file

8b114bd

last changes on script -> 15 fails

deec2c0

fixing script to properly handle sub directory in test query directory

4277188

positive_expected_result files filled by the script

2a9202a

cx-ricardo-jesus force-pushed the AST-137381--create-new-script-to-write-positive_expected_result-file branch from db10620 to 2a9202a Compare April 21, 2026 09:29

cx-ricardo-jesus added 3 commits April 21, 2026 10:35

added support for CNI files in the analyzer

93df241

adding SubDocumentIdx into FileMetadata structure to use it in kics scan output result

3824333

added corrected results on positive_expected_result + removed unnecessary comented code

4f1f9f5

cx-ricardo-jesus changed the title ~~feat(script): new script to fill positive_expected_result~~ feat(script): new script to fill positive_expected_result and fixing similarityID generation in scan commands Apr 22, 2026

cx-ricardo-jesus changed the title ~~feat(script): new script to fill positive_expected_result and fixing similarityID generation in scan commands~~ feat(script): new script to fill positive_expected_result and fixing minibug on similarityID generation in scan commands Apr 22, 2026

Merge branch 'master' into AST-137381--create-new-script-to-write-positive_expected_result-file

25e738a

cx-ricardo-jesus marked this pull request as ready for review April 22, 2026 11:58

cx-ricardo-jesus added 3 commits April 22, 2026 14:44

fix script function complexity - fix_secrects_query_names

0a49b93

fixing codacy issues

db0b197

Merge branch 'master' into AST-137381--create-new-script-to-write-positive_expected_result-file

37b5a69

fixing codacy issues

a70af2d

cx-ricardo-jesus wants to merge 22 commits into master from AST-137381--create-new-script-to-write-positive_expected_result-file
